Abstract

Drift analysis has become a powerful tool to prove bounds on the
runtime of randomized search heuristics. It allows, for example, fairly
simple proofs for the classical problem of how the (1+1) Evolutionary
Algorithm (EA) optimizes an arbitrary pseudo-Boolean linear function.
The key idea of drift analysis is to measure the progress via another 
pseudo-Boolean function (called drift function) and use deeper results 
from probability theory to derive from this a good bound for the run- 
time of the EA. Surprisingly, all these results manage to use the same 
drift function for all linear objective functions. 

In this work, we show that such universal drift functions only exist 
if the mutation probability is close to the standard value of 1/n. 



1 Introduction 

An innocent-looking problem is the question of how long the (1+1)
Evolutionary Algorithm ((1+1) EA) needs to find the optimum of a given
linear function. However, this is in fact one of the problems that were
most influential for the theory of evolutionary algorithms.

While particular linear functions like OneMax were easily analyzed, it
took a major effort by Droste, Jansen and Wegener [DJW02] to solve the
problem in full generality and to show that the (1+1) EA optimizes any
linear function in $O(n \log n)$ steps. Their proof of the result, however,
is highly technical.

A major breakthrough spurred by this problem is the work by He and
Yao [HY01,HY03], who introduced drift analysis to the field of evolutionary
computation. This allowed a significantly simpler proof for the linear
functions problem. Even more importantly, drift analysis quickly became one
of the most powerful tools for proving both upper and lower bounds on the
runtime of evolutionary algorithms. See, e.g., [HY03,GL06,HJKN08,OW10,
Hel10].

In a nutshell, the drift analysis conducted by He and Yao is a potential
function argument. For a suitable potential function (usually called drift 
function), they show that in each iteration of a run of the (1+1) EA opti- 
mizing the original linear function, also a certain improvement with respect 
to the drift function is obtained. By this, stopping-time arguments which 
were difficult to obtain for the original function can be replaced by such 
arguments for the drift function. 

However, the proof given by He and Yao [HY01,HY03] is still not easy. 
The difficulties include both finding a suitable drift function and proving 
that this function has a positive drift in every search point. 

Further progress was made by Jägersküpper [Jag08], who used a
clever averaging argument avoiding the need to show that from each search
point on there is a positive drift. In consequence, he was able to use as drift
function the natural OneMax function, which simply counts the number
of 1-bits in the bit string. This also made it possible to determine reasonable
values for the usually not explicitly given constants. More precisely,
Jägersküpper showed that the expected optimization time for any linear
function defined on n-bit strings is bounded by $(1 + o(1))\, 2.02\, e n \ln(n)$.

In [DJW10b], a multiplicative drift theorem was proposed. It allows
a simpler and more natural proof of the $O(n \log n)$ bound. By combining
Jägersküpper's arguments from [Jag08] with the multiplicative drift
theorem, the authors improved his upper bound to $(1 + o(1))\, 1.39\, e n \ln(n)$.

Interestingly, in each of these results the same drift function could be 
used for all linear objective functions. We call such a drift function universal. 

Our Results 

In this work, we show that the existence of universal drift functions depends 
highly on the mutation probability. If this is larger than the standard value 
of 1/n by more than a certain constant factor, universal drift functions do 
not exist. In consequence, it is not clear how to extend previous results to 
large mutation probabilities. 

We show that the (1+1) EA$_c$, i.e., the (1+1) EA with mutation probability
$c/n$, does not allow linear universal drift functions even for relatively small
values of $c$. More precisely, we show that the classical, additive drift method
by He and Yao does not allow universal drift functions for values of $c$ larger
than 4. The multiplicative drift method does not allow linear universal drift
functions for values of $c$ greater than 2.2. Lastly, we show that even if we
combine the Jägersküpper approach with the multiplicative drift method,
linear universal drift functions do not exist if the mutation probability
exceeds $7/n$.

2 Optimizing Linear Functions with the (1+1) EA



Throughout this paper, we are interested in the performance of different 
variants of the (1+1) EA on the class of linear functions. To be more 
precise, we are interested in the proof techniques which allow us to show a 
certain behavior of the algorithms on this class of functions. 

The aim of this section is to give a very brief introduction to the class 
of linear functions and to the algorithms under consideration. 

Before we move on, here is some notation which we use throughout the
work. By $\mathbb{N}$ we denote the set of positive integers and, accordingly,
we set $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$. If no further comments are made,
$n$ will always denote the length of the input, i.e., in our case, the length of
the bit strings in the considered search space. For convenience, we write $[n]$
for $\mathbb{N}_{\le n} = \{1, \dots, n\}$. A bit string $x \in \{0,1\}^n$ is
denoted by $x_n \ldots x_1$. This notation is inspired by the function
BinVal (which will be defined below). It allows us to use the standard
notation for binary representation of natural numbers.

For every $i \in \mathbb{N}_{\le n}$ let $e_i \in \{0,1\}^n$ be the $i$-th unit
vector, i.e., $(e_i)_j = 1$ if and only if $j = i$. By $\oplus$ we denote the
bitwise XOR operation on bit strings, i.e., for all $x, y \in \{0,1\}^n$ we have
$(x \oplus y)_i = 1$ if and only if $x_i \ne y_i$. For a stochastic event $A$, we
denote by $\chi(A)$ the characteristic function of $A$, i.e., $\chi(A) = 1$ if
$A$ occurs, and $\chi(A) = 0$ otherwise.

2.1 Linear Functions 

Definition 1 (Linear Functions). Let $n \in \mathbb{N}$. A function
$f\colon \{0,1\}^n \to \mathbb{R}$ is called linear if there exist weights
$w_1, \dots, w_n \in \mathbb{R}$ such that for all $x \in \{0,1\}^n$ it holds
that $f(x) = \sum_{j=1}^{n} w_j x_j$. The class of linear functions Linear is
defined as the set of all such functions, i.e.,

$$\textsc{Linear} := \Bigl\{ f\colon \{0,1\}^n \to \mathbb{R},\; x \mapsto \sum_{j=1}^{n} w_j x_j \;\Bigm|\; w_1, \dots, w_n \in \mathbb{R} \Bigr\}. \tag{1}$$

We say that $f$ has monotone weights if $w_1 \le \dots \le w_n$. It is easy to
see (and has been argued in [DJW02]) that when analyzing how the (1+1) EA
optimizes a linear function we can assume monotone weights without loss
of generality. Furthermore, we always assume $w_1 > 0$, again not restricting
the generality of the results.

For the purpose of better legibility, we ignore the dilemma of using $f$ as
name for the function itself and for its weights and write
$f(x) = \sum_{j=1}^{n} f_j x_j$ and, similarly, $g(x) = \sum_{j=1}^{n} g_j x_j$
for linear functions $f$ and $g$.

Also note that in this paper we are interested in the number of iterations
it takes to minimize a given linear function. However, the optimization time
bounds obtained in this work are the same for the minimization and the
maximization problem.

  Choose $x^{(0)} \in \{0,1\}^n$ uniformly at random;
  for $t = 1$ to $\infty$ do
      Sample $y^{(t)} \in \{0,1\}^n$ by flipping each bit in $x^{(t-1)}$ with probability $c/n$;
      if $f(y^{(t)}) \le f(x^{(t-1)})$ then
          $x^{(t)} := y^{(t)}$
      else
          $x^{(t)} := x^{(t-1)}$

Algorithm 1: (1+1) EA$_c$: The (1+1) Evolutionary Algorithm with mutation
rate $c/n$ for minimizing $f\colon \{0,1\}^n \to \mathbb{R}$.

In the following discussions, two linear functions will play an important
role. The first one, the so-called OneMax function, simply counts
the number of ones in the bit string, i.e., $\textsc{OneMax}(x) = \sum_{j=1}^{n} x_j$.
We shall often abbreviate $\textsc{OneMax}(x)$ by $|x|_1$, in particular if the
space is limited. The second function of particular interest is BinVal. It is
defined by $\textsc{BinVal}(x) = \sum_{j=1}^{n} 2^{j} x_j$. We will discuss some
properties of these two functions in the next subsection.

2.2 The (1+1) Evolutionary Algorithm 

Our main interest in this work is to show that certain analysis techniques for
the (1+1) Evolutionary Algorithm fail if we change the mutation probability
only by a constant factor. We denote by (1+1) EA$_c$ the (1+1) EA where
the standard mutation rate of $p = 1/n$ is changed to $p = c/n$, where $c$ is
some positive constant (cf. Algorithm 1).

Let us comment on some features of the (1+1) EA$_c$ as described in
Algorithm 1. It starts with a randomly chosen initial bit string $x^{(0)}$. Thus,
on average, we expect $n/2$ bits to equal 0 and the other half to equal 1. In
each iteration $t \ge 1$ the (1+1) EA$_c$ performs two steps.

The mutation step can be described as follows: The algorithm creates
a random vector $Y^{(t)} \in \{0,1\}^n$ such that $\Pr[Y^{(t)}_i = 1] = c/n$,
mutually independently for all $i$. Then, $y^{(t)} = x^{(t-1)} \oplus Y^{(t)}$ is
the new candidate for the next search point.

In the selection step, the algorithm ensures that $y^{(t)}$ is accepted as a
new search point only if it is at least as good as the current solution, i.e.,

$$x^{(t)} = \begin{cases} y^{(t)} & \text{if } f(y^{(t)}) \le f(x^{(t-1)}), \\ x^{(t-1)} & \text{otherwise.} \end{cases}$$

The expected number of iterations $T$ until the (1+1) EA$_c$ selects for the
first time a bit string $x$ such that $f(x)$ is minimal is called the optimization
time of the (1+1) EA$_c$. Note that one iteration consists of exactly one
mutation and one selection step.
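The mutation and selection steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `one_plus_one_ea` and `max_iters` are hypothetical, the loop stops as soon as the all-zeros string (the minimum for positive weights) is reached instead of running forever, and BinVal is assumed to use the weights $2^j$.

```python
import random


def one_max(x):
    """OneMax: number of 1-bits in the bit string x."""
    return sum(x)


def bin_val(x):
    """BinVal with weight 2**j for bit j = 1..n (x[0] stores bit 1)."""
    return sum((2 ** (j + 1)) * bit for j, bit in enumerate(x))


def one_plus_one_ea(f, n, c=1.0, rng=None, max_iters=10**6):
    """Minimize f over {0,1}^n with mutation probability c/n.

    Returns (x, t): the final search point and the number of iterations used.
    Assumes the all-zeros string is the unique minimum of f.
    """
    rng = rng or random.Random()
    p = c / n
    x = [rng.randint(0, 1) for _ in range(n)]  # uniform random start
    fx = f(x)
    f_opt = f([0] * n)
    t = 0
    while fx > f_opt and t < max_iters:
        t += 1
        # Mutation: flip each bit independently with probability c/n.
        y = [bit ^ (rng.random() < p) for bit in x]
        fy = f(y)
        # Selection: accept y only if it is at least as good as x.
        if fy <= fx:
            x, fx = y, fy
    return x, t
```

For example, `one_plus_one_ea(bin_val, 12, c=2.0, rng=random.Random(7))` runs the (1+1) EA$_2$ on BinVal with 12 bits; the iteration count it returns is one sample of the random optimization time, not its expectation.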

Let us consider the behavior of the (1+1) EA on our two example functions,
OneMax and BinVal. As OneMax simply counts the number of
1s in the bit string, the following holds. If during one iteration of the
(1+1) EA the string $y$ is created from $x$, $y$ is accepted as a new search point
for the next iteration if and only if $|y|_1 \le |x|_1$.

The situation is different for the second example function, BinVal.
When optimizing this function, the inequality $2^{i} > \sum_{j=1}^{i-1} 2^{j}$
implies that the algorithm accepts a new bit string if and only if the
highest-indexed bit that is touched by the mutation is flipped from one to zero.
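The acceptance rule for BinVal can be checked exhaustively for small $n$. The following sketch compares the rule "the highest-indexed flipped bit goes from one to zero" against the direct comparison of function values. The helper names are hypothetical, the weights $2^j$ are an assumption about the garbled definition, and the trivial case in which no bit flips is counted as accepted.

```python
from itertools import product


def bin_val(x):
    """BinVal with weight 2**j for bit j = 1..n (x[0] stores bit 1)."""
    return sum((2 ** (j + 1)) * bit for j, bit in enumerate(x))


def accepted_by_value(x, y):
    """Selection step of the (1+1) EA when minimizing BinVal."""
    return bin_val(y) <= bin_val(x)


def accepted_by_rule(x, y):
    """Rule from the text: the highest-indexed flipped bit goes 1 -> 0."""
    flipped = [j for j in range(len(x)) if x[j] != y[j]]
    if not flipped:
        return True  # nothing changed, so the offspring equals the parent
    return x[max(flipped)] == 1


def rule_matches_everywhere(n):
    """Compare both criteria on all pairs (x, y) of n-bit strings."""
    points = list(product((0, 1), repeat=n))
    return all(accepted_by_value(x, y) == accepted_by_rule(x, y)
               for x in points for y in points)
```

The check enumerates all $4^n$ pairs, so it is only meant for very small $n$.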

In spite of this different behavior, Droste, Jansen and Wegener could
prove in their seminal paper [DJW02] that the expected optimization time
of the standard (1+1) EA (with mutation probability $p = 1/n$) is of the
order $\Theta(n \log n)$ for all linear functions.

Theorem 2 ([DJW02]). The expected optimization time of the standard
(1+1) EA on any linear function with positive weights is $\Theta(n \log n)$.

A more precise upper bound of $(1 + o(1))\, 2.02\, e n \ln n$ was provided by
Jägersküpper [Jag08]. In [DJW10a], the authors of this paper improved
the bound to $(1 + o(1))\, 1.39\, e n \ln n$, together with a lower bound of
$(1 - o(1))\, e n \ln n$. A discussion of the proof methods is given in the next
section.

3 Drift Analysis 

Drift analysis has been introduced to the field of evolutionary computa- 
tion by He and Yao [HY01,HY03]. The method builds on a result of Ha- 
jek [Haj82]. The main idea of He and Yao is the following. When analyz- 
ing the optimization behavior of a randomized search heuristic, instead of 
tracking how the objective function becomes better, one uses an auxiliary 
function, the so-called potential or drift function and tracks its behavior. 
The drift function is typically designed in such a way that it is minimal if 
and only if the objective function itself is minimized. We give an example 
after the formal description of the method. 

3.1 Additive Drift 

The following additive drift theorem was introduced to the field of evolu- 
tionary computation by He and Yao. 

Theorem 3 (Additive Drift Theorem [HY04]). Let $\{Z^{(t)}\}_{t \in \mathbb{N}_0}$
be random variables describing a Markov process over a finite state space
$S \subseteq \mathbb{R}$. Let $T$ be the random variable that denotes the
earliest point in time $t \in \mathbb{N}_0$ such that $Z^{(t)} \le 0$.

If there exist $\delta > 0$ and $c > 0$ such that

(i) $E[Z^{(t)} - Z^{(t+1)} \mid t < T] \ge \delta$ and
(ii) $Z^{(0)} \le c$,

then $E[T] \le c/\delta$.

The idea of applying this theorem to the analysis of the (1+1) EA is
as follows. Given a function $f$ and a mutation probability $p = c/n$, let us
denote by $\{x^{(t)}\}_{t \in \mathbb{N}_0}$ the (random) series of the search
points (after selection) of one run of the (1+1) EA$_c$. We now try to find
another function $g$ such that

(a) $\{x \mid f(x) \text{ minimal}\} = \{x \mid g(x) \text{ minimal}\}$ and

(b) $\{Z^{(t)}\}_{t \in \mathbb{N}_0} := \{g(x^{(t)})\}_{t \in \mathbb{N}_0}$ fulfills the requirements of Theorem 3.

The drift theorem then provides an upper bound for the expected time
needed by the (1+1) EA$_c$ to minimize $g$. Condition (a) ensures that the
same upper bound holds for $f$ as well.

Condition (b) is typically a little tricky to prove. It requires that, given
some $x \in \{0,1\}^n$, we can expect, on average, that $g$ becomes smaller
whenever $f$ does. That is, $g$ drifts in the same direction as the objective
function $f$ itself. That is why we call $g$ a drift function for $f$.

Let us give a short example. When using the (1+1) EA$_1$ (see footnote 1) to
minimize a pseudo-Boolean linear function
$f\colon \{0,1\}^n \to \mathbb{R},\; x \mapsto \sum_{j=1}^{n} f_j x_j$ with
arbitrary positive weights $0 < f_1 \le \dots \le f_n$, the drift function can be
chosen as

$$g\colon \{0,1\}^n \to \mathbb{R},\quad x \mapsto \ln\Bigl(1 + \sum_{j=1}^{\lfloor n/2 \rfloor} x_j + \sum_{j=\lfloor n/2 \rfloor + 1}^{n} 2 x_j\Bigr).$$

Though still needing some calculations, one can show the following. If
$y$ is the result of one iteration (mutation and selection) of the (1+1) EA
starting in some non-optimal $x \in \{0,1\}^n$, then

$$E[g(x) - g(y)] \ge \delta/n \tag{2}$$

for some $\delta > 0$. The application of Theorem 3 yields that after an expected
number of at most $g(x)/(\delta/n) = O(n \log n)$ iterations, our initial
$g$-value of $g(x)$ is reduced to 0. But $g(y) = 0$ implies $f(y) = 0$, that is,
the (1+1) EA has found the desired optimum of $f$ after $O(n \log n)$ iterations.

3.2 Multiplicative Drift 

Using drift analysis usually bears two difficulties. The first is guessing a
suitable drift function $g$. The second, related to the first, is proving that
during a run of the (1+1) EA, $f$ and $g$ behave sufficiently similarly, that is,
we can prove some statement like inequality (2). Note that this inequality
contains information about $f$ as well, namely implicitly in the fact that $y$
has an $f$-value at least as good as that of $x$.

1 In fact, He and Yao analyzed a variant of the (1+1) EA presented here. In this variant,
a candidate search point is only accepted if it is strictly better than the current search point.
However, the results for upper bounds in this work can easily be transferred to the setting
of [HY01,HY03].

What makes showing that $g$ (as chosen in Subsection 3.1) is a suitable
drift function particularly costly is the logarithm around the simple linear
function $x \mapsto \sum_{j=1}^{\lfloor n/2 \rfloor} x_j + \sum_{j=\lfloor n/2 \rfloor + 1}^{n} 2 x_j$.

This motivated the authors to formulate a different, multiplicative version
of He and Yao's drift method in [DJW10b]. Although the method can
easily be derived from the additive version, it has been shown to be a very
natural and elegant way for proving upper bounds on the optimization time
of randomized search heuristics on different classes of problems.

Theorem 4 (Multiplicative Drift Theorem [DJW10b]). Let $S \subseteq \mathbb{R}^+$
be a finite set with minimum $s_{\min} := \min\{s \in S\}$. Let
$\{X^{(t)}\}_{t \in \mathbb{N}_0}$ be a sequence of random variables over
$S \cup \{0\}$. Let $T$ be the random variable that denotes the first point in
time $t \in \mathbb{N}_0$ for which $X^{(t)} = 0$.

Suppose that there exists a constant $\delta > 0$ such that

$$E[X^{(t)} - X^{(t+1)} \mid X^{(t)} = s,\; T > t] \ge \delta s \tag{3}$$

holds for all $s \in S$ such that $\Pr[X^{(t)} = s,\; T > t] > 0$. Then, for all
$s_0 \in S$,

$$E[T \mid X^{(0)} = s_0] \le \frac{1 + \ln(s_0/s_{\min})}{\delta}.$$
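To make the bound of Theorem 4 concrete, the following sketch evaluates $(1 + \ln(s_0/s_{\min}))/\delta$ for a standard illustration that is not taken from this section: the (1+1) EA with mutation probability $1/n$ on OneMax, with OneMax itself as potential. Each of the $k$ one-bits is flipped alone with probability $(1/n)(1-1/n)^{n-1}$, so $\delta = (1/n)(1-1/n)^{n-1} \ge 1/(en)$ is a valid choice under this assumption. The function names are hypothetical.

```python
import math


def multiplicative_drift_bound(delta, s0, s_min):
    """Upper bound (1 + ln(s0 / s_min)) / delta from the drift theorem."""
    return (1.0 + math.log(s0 / s_min)) / delta


def onemax_delta(n):
    """Drift factor for OneMax under mutation probability 1/n.

    A fixed one-bit is the only flipped bit with probability
    (1/n) * (1 - 1/n)**(n - 1); summing over the k one-bits gives an
    expected decrease of at least delta * k.
    """
    return (1.0 / n) * (1.0 - 1.0 / n) ** (n - 1)


def onemax_time_bound(n):
    """Bound on the expected optimization time, starting from at most n ones."""
    return multiplicative_drift_bound(onemax_delta(n), s0=n, s_min=1)
```

Since $(1-1/n)^{n-1} \ge 1/e$, the value returned by `onemax_time_bound(n)` never exceeds $e n (1 + \ln n)$.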

Note that, whenever $g$ is a drift function for some function $f$ in the sense
that

(i) $\{x \mid f(x) \text{ minimal}\} = \{x \mid g(x) \text{ minimal}\}$ and

(ii) $\{X^{(t)}\}_{t \in \mathbb{N}_0} := \{g(x^{(t)})\}_{t \in \mathbb{N}_0}$ fulfills the requirements of Theorem 4,

the function $\ln(1 + g)$ is a drift function in the classical sense of Section 3.1.
The converse is not true. That is, if $g$ is a linear function such that
$\ln(1 + g)$ is a drift function for $f$ in the sense of Theorem 3, one cannot
conclude that $g$ itself is a drift function in the multiplicative setting of
Theorem 4. However, the analyses carried out in Section 4 can be transferred
to the additive setting, as shown in Subsection 4.2.

Let us note that it has been shown in [DJW10b] that the multiplicative
drift theorem allows a fairly simple proof for Theorem 2. There, the drift
function is chosen to be $g\colon \{0,1\}^n \to \mathbb{R},\; x \mapsto \sum_{j=1}^{n} g_j x_j$
with $g_j = 1$ for $j \le n/2$ and $g_j = 5/4$ otherwise.

3.3 Distribution-based Multiplicative Drift

One may ask why, in the additive setting, one should not take
$g(x) := \ln(1 + \textsc{OneMax}(x))$ as potential function. However, an easy
observation reveals that there is an objective function $f$ and a search point
$x$ such that $g$ yields too small a drift with respect to $f$. To see this, let
$x := e_n = (1, 0, \dots, 0)$ and let $f := \textsc{BinVal}$ be the function to be
optimized. Then the point-wise drift (2) of $g$ is only of order $O(1/n^2)$. This
example shows that finding a drift function yielding point-wise drift for all $x$
and all $f$ is not so easy.

The first to overcome the difficulties of point-wise drift was
Jägersküpper [Jag08]. While he still avoids completely analyzing the actual
distribution of $x^{(t)}$, he does show the following property of this
distribution, which in turn allows him to use a distribution-based drift
approach. In this way, he avoids the need for point-wise drift. Jägersküpper's
simple observation is that at any time step $t$, the more valuable bits are more
likely to be in the desired setting.

Theorem 5 ([Jag08]). Let $n \in \mathbb{N}$ and let $x^{(t)}$ denote the random
individual (distributed over $\{0,1\}^n$) after $t \in \mathbb{N}_0$ iterations of
the (1+1) EA$_1$ minimizing a linear function $f\colon \{0,1\}^n \to \mathbb{R}$.
Then,

$$\Pr[x^{(t)}_1 = 0] \le \dots \le \Pr[x^{(t)}_n = 0].$$

This statement remains true if we condition on $|x^{(t)}|_1 = k$ for any $k \in [n]$.

Using this theorem, Jägersküpper was able to show a lower bound
of $\Omega(1/n)$ for the multiplicative drift of OneMax as potential function for
any linear function.

Proposition 6. Let $n \in \mathbb{N}$, let $f\colon \{0,1\}^n \to \mathbb{N}_0$ be
linear and let $g := \textsc{OneMax}$. Let $x^{(t)}$ be the individual in the
$t$-th iteration of the (1+1) EA$_1$ minimizing $f$. Then,

E\g(xW) - g(x^) | g(x®) = k] > ^~^ k . 

en 

holds for all $k \in \mathbb{N}$ and $t \in \mathbb{N}_0$ with $\Pr[g(x^{(t)}) = k] > 0$.

Proposition 6 shows that it is even possible to take OneMax as a drift
function if we consider the (1+1) EA$_1$. Using this approach, Jägersküpper
was not only able to give a more natural proof for the $O(n \log n)$ bound
of the (1+1) EA$_1$ on the class of linear functions, but he could also give a
meaningful upper bound on the leading constant. More precisely, he shows
that the expected optimization time of the (1+1) EA minimizing a linear
function on $n$ bits is at most $(1 + o(1))\, 2.02\, e n \ln(n)$.

4 Non-Existence of Linear Universal Drift Functions

In the previous section, we have seen for the different drift methods that, if
we are considering the standard (1+1) EA$_1$ with mutation probability $1/n$,
we are able to define a linear function $g$ such that $g$ (or $\ln(1+g)$,
respectively) serves as a good drift function for all linear functions,
independently of the particular weights. In the following, we call such
functions $g$ linear universal drift functions. We give a more precise definition
below.

In this section we prove the main results of this paper, namely that
universal linear drift functions with monotone weights do not exist if the
mutation probability exceeds $c/n$ for some small, setting-dependent constant
$c$.

Before we formulate the theorems, let us introduce the operator $\Delta_c$, which
measures the progress made by the (1+1) EA$_c$ on $f$ with respect to some
other function $g$.

Definition 7 ($\Delta_c(g, f, x)$). Let $Y \in \{0,1\}^n$ be randomly chosen such
that $\Pr[Y_i = 1] = c/n$, mutually independently for all $i \in [n]$. For
$f, g\colon \{0,1\}^n \to \mathbb{R}$ and for $x \in \{0,1\}^n$ we define the random
variable $\Delta_c(g, f, x)$ by

$$\Delta_c(g, f, x) := \bigl(g(x) - g(x \oplus Y)\bigr) \cdot \chi\bigl(f(x \oplus Y) \le f(x)\bigr).$$
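For small $n$, the expectation of the random variable in Definition 7 can be computed exactly by enumerating all $2^n$ mutation vectors $Y$. The sketch below does this; the function name `expected_delta` is hypothetical, bit strings are tuples with `x[0]` holding bit 1, and acceptance is assumed to use "at least as good", as in Algorithm 1.

```python
from itertools import product


def one_max(x):
    """OneMax: number of 1-bits in x."""
    return sum(x)


def expected_delta(g, f, x, c):
    """Exact E[Delta_c(g, f, x)] by enumeration of all mutation vectors Y."""
    n = len(x)
    p = c / n
    total = 0.0
    for Y in product((0, 1), repeat=n):
        flips = sum(Y)
        prob = p ** flips * (1 - p) ** (n - flips)  # Pr[Y] for independent bits
        y = tuple(a ^ b for a, b in zip(x, Y))      # offspring x XOR Y
        if f(y) <= f(x):                            # indicator: offspring accepted
            total += prob * (g(x) - g(y))
    return total
```

As a sanity check, for $g = f = \textsc{OneMax}$ and $x = e_1$ the only accepted mutation that changes $g$ is $Y = e_1$, so the expectation equals $p(1-p)^{n-1}$.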

If we are considering the multiplicative setting from Subsections 3.2 and
3.3, we say that $g$ is a linear universal drift function if $g$ itself is linear
and if $E[\Delta_c(g, f, x)] \ge 0$ for all linear functions $f$ with monotone
weights and all possible search points $x \in \{0,1\}^n$. When we consider the
additive setting from Subsection 3.1, the second condition translates to
$E[\Delta_c(\ln(1 + g), f, x)] \ge 0$.

A definition for the distribution-based setting of Jägersküpper will be
given in Subsection 4.3.

4.1 Multiplicative Setting 

We first show the non-existence result for the multiplicative setting.
Intuitively speaking, it tells us that linear universal drift functions do not
exist if the mutation probability is larger than $2.2/n$. We then present in the
next subsection how this result can be transferred to the setting of the additive
drift theorem.

Theorem 8 (Nonexistence Theorem for Multiplicative Drift). Let $n \in \mathbb{N}$
be sufficiently large and let $c > 2.2$. If we consider the (1+1) EA$_c$, the
following statement holds. For every linear function
$g\colon \{0,1\}^n \to \mathbb{R},\; x \mapsto \sum_{j=1}^{n} g_j x_j$ with
$1 = g_1 \le \dots \le g_n$, there exist a linear function $f$ with monotone
weights and a bit string $x \in \{0,1\}^n$ such that $E[\Delta_c(g, f, x)] < 0$.

We prove the theorem by contraposition. To this end, let $n \in \mathbb{N}$ be
sufficiently large and let us assume that there exists a universal linear drift
function $g$ with $1 = g_1 \le \dots \le g_n$. That is, for every linear function
$f$ with monotone weights and every $x \in \{0,1\}^n$ we have
$E[\Delta_c(g, f, x)] \ge 0$. It suffices to show that $c$ cannot be larger than 2.2.

The proof is structured as follows: In Proposition 9 and Corollary 10 we
derive lower bounds for $\sum_{j=1}^{n} g_j$. An upper bound is given in
Proposition 11. The combination of the three results will conclude the proof.

Proposition 9. If we consider the (1+1) EA$_c$ for some constant $c$ and if
$g$ is a universal linear drift function with monotone weights, then for all
$1 \le i \le n$ we have $g_i \ge \frac{c}{n} \sum_{j=1}^{i-1} g_j$.

Proof. Let $f = \textsc{BinVal}$, $i \in [n]$ and $x = e_i$. Let $A$ be the event
that the $i$-th bit is the highest-indexed flipping bit. Formally, let
$Y \in \{0,1\}^n$ be the vector indicating which bits are being flipped, i.e.,
$Y_j = 1$ if and only if the $j$-th bit $x_j$ of $x$ is flipped. Then event $A$
happens if and only if $Y_i = 1$ and $Y_j = 0$ for all $j > i$. Clearly, $A$
expresses the event that $x \oplus Y$ is accepted as the new search point and
$x \oplus Y \ne x$. That is, $\chi(f(x \oplus Y) < f(x)) = \chi(A)$. Thus,

$$E[\Delta_c(g, f, x)] = E[g(x) - g(x \oplus Y) \mid A] \cdot \Pr[A].$$

It is easy to verify that

$$\Pr[A] = \frac{c}{n}\Bigl(1 - \frac{c}{n}\Bigr)^{n-i},$$

which is strictly positive. From $0 \le E[\Delta_c(g, f, x)]$ we conclude

$$0 \le E[g(x) - g(x \oplus Y) \mid A] = g_i - \frac{c}{n} \sum_{j=1}^{i-1} g_j$$

and the statement follows. □
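The identity used in this proof, $E[\Delta_c(g, \textsc{BinVal}, e_i)] = \Pr[A] \cdot (g_i - \frac{c}{n}\sum_{j<i} g_j)$ with $\Pr[A] = \frac{c}{n}(1-\frac{c}{n})^{n-i}$, can be verified numerically for small $n$. The sketch below compares a brute-force enumeration with the closed form; the function names and the example weights are hypothetical, and BinVal is assumed to use the weights $2^j$.

```python
from itertools import product


def bin_val(x):
    """BinVal with weight 2**j for bit j = 1..n (x[0] stores bit 1)."""
    return sum((2 ** (j + 1)) * bit for j, bit in enumerate(x))


def enumerated_drift(weights, i, c):
    """E[Delta_c(g, BinVal, e_i)] by enumerating all mutation vectors."""
    n = len(weights)
    p = c / n
    x = tuple(1 if j == i - 1 else 0 for j in range(n))  # unit vector e_i

    def g(z):
        return sum(w * bit for w, bit in zip(weights, z))

    total = 0.0
    for Y in product((0, 1), repeat=n):
        y = tuple(a ^ b for a, b in zip(x, Y))
        if bin_val(y) <= bin_val(x):  # offspring accepted
            flips = sum(Y)
            total += p ** flips * (1 - p) ** (n - flips) * (g(x) - g(y))
    return total


def closed_form_drift(weights, i, c):
    """Pr[A] * (g_i - p * sum_{j < i} g_j) with Pr[A] = p * (1 - p)**(n - i)."""
    n = len(weights)
    p = c / n
    return p * (1 - p) ** (n - i) * (weights[i - 1] - p * sum(weights[:i - 1]))
```

With monotone weights that grow too slowly, the closed form becomes negative for large $i$, which is exactly the situation Proposition 9 rules out for a universal drift function.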

Corollary 10. Let us consider the (1+1) EA$_c$ for some constant $c > 1$
and let $g$ be a universal linear drift function with monotone weights. For
$k := \lceil n/c \rceil$ and $\ell \in \{1, \dots, n-k\}$ it holds that

$$g_{k+\ell} \ge \Bigl(1 + \frac{c}{n}\Bigr)^{\ell-1}.$$

Proof. We show the claim via induction with respect to $\ell$. By definition,
$g_{k+1} \ge g_1 = 1$. Now, for $\ell > 1$, Proposition 9 and the induction
hypothesis yield

$$g_{k+\ell} \ge \frac{c}{n}\sum_{j=1}^{k+\ell-1} g_j \ge \frac{c}{n}\,k + \frac{c}{n}\sum_{j=k+1}^{k+\ell-1} g_j \ge 1 + \frac{c}{n}\sum_{j=1}^{\ell-1}\Bigl(1 + \frac{c}{n}\Bigr)^{j-1} = \Bigl(1 + \frac{c}{n}\Bigr)^{\ell-1},$$

and again the statement follows. □

We now prove an upper bound for the sum of the weights of $g$.

Proposition 11. Let $c$ be a constant and let $p := c/n$. If we consider the
(1+1) EA$_c$ and if $g$ is a linear universal drift function with monotone
weights, it holds that $\sum_{j=1}^{n} g_j \le \frac{1 + np - p}{p}$.

Proof. Let $f = \textsc{OneMax}$ and $x = e_1$. Then, $f(x) = 1$ and the event
$f(x \oplus Y) \le f(x)$ occurs if and only if

$$f(x \oplus Y) = \sum_{j=1}^{n} (x \oplus Y)_j \le 1.$$

Therefore, let us denote by $A$ the event that $Y = e_1$, and by $B_j$ the event
that $Y = e_1 \oplus e_j$ for $j > 1$. Finally, let us denote by $C$ the event that
$Y = 0$. Then,

$$\chi(f(x \oplus Y) \le f(x)) = \chi(A) + \sum_{j=2}^{n} \chi(B_j) + \chi(C).$$

Thus,

$$E[\Delta_c(g, f, x)] = E[g(x) - g(x \oplus Y) \mid A] \cdot \Pr[A] + \sum_{j=2}^{n} E[g(x) - g(x \oplus Y) \mid B_j] \cdot \Pr[B_j] + E[g(x) - g(x \oplus Y) \mid C] \cdot \Pr[C],$$

the last summand equaling 0. Now,

$$E[g(x) - g(x \oplus Y) \mid A] = g_1,$$
$$E[g(x) - g(x \oplus Y) \mid B_j] = g_1 - g_j,$$
$$\Pr[A] = (1-p)^{n-1} p, \text{ and}$$
$$\Pr[B_j] = (1-p)^{n-2} p^2.$$

Since $g$ is a drift function for $f$ we have $E[\Delta_c(g, f, x)] \ge 0$. Hence,

$$0 \le E[\Delta_c(g, f, x)] = (1-p)^{n-2}\, p \Bigl((1-p)\, g_1 + p \sum_{j=2}^{n} (g_1 - g_j)\Bigr),$$

yielding

$$0 \le (1-p)\, g_1 + p \sum_{j=2}^{n} (g_1 - g_j).$$

By rearranging and using $g_1 = 1$ we have

$$p \sum_{j=1}^{n} g_j \le 1 + (n-1)p = 1 - p + np,$$

which concludes the proof. □

The upper and lower bounds for the weights of $g$ now allow us to prove
Theorem 8:



Proof of Theorem 8. We need to prove $c \le 2.2$. Let us again abbreviate
$p := c/n$. For the purpose of better readability let $k := \lceil n/c \rceil$.
Propositions 9 and 11, together with Corollary 10, yield

$$\frac{1 - p + np}{p} \ge \sum_{i=1}^{n} g_i \ge k + \sum_{\ell=1}^{n-k} (1+p)^{\ell-1} = k + \frac{(1+p)^{n-k} - 1}{p} \ge \frac{(1+p)^{n-k}}{p},$$

where the last step uses $k \ge 1/p$. Thus,

$$1 \ge (1+p)^{n-k} + p(1-n). \tag{4}$$

By re-substituting $p$ with $c/n$, the term on the right-hand side can be
bounded from below by

$$\Bigl(1 + \frac{c}{n}\Bigr)^{n(1 - 1/c) - 1} + \frac{c}{n} - c = e^{c-1}(1 - o(1)) - c.$$

For sufficiently large $n$ and $c > 2.2$, this term exceeds 1, which contradicts
inequality (4). Hence, $c \le 2.2$. □
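The last step relies on $e^{c-1} - c$ exceeding 1 for $c > 2.2$. The following sketch locates the point where the limiting expression crosses 1 by bisection and also evaluates the finite-$n$ right-hand side of inequality (4). The function names are hypothetical, and the finite-$n$ expression uses $k = \lceil n/c \rceil$ as in the reconstruction above.

```python
import math


def limit_rhs(c):
    """Limit of the right-hand side of (4) as n grows: e^(c-1) - c."""
    return math.exp(c - 1.0) - c


def finite_rhs(n, c):
    """Right-hand side of (4) for finite n: (1+p)^(n-k) + p*(1-n)."""
    p = c / n
    k = math.ceil(n / c)
    return (1.0 + p) ** (n - k) + p * (1 - n)


def crossing_point(lo=1.0, hi=3.0, tol=1e-12):
    """Smallest c in [lo, hi] with limit_rhs(c) = 1, found by bisection.

    limit_rhs is increasing on [1, 3] because its derivative e^(c-1) - 1
    is nonnegative there, so bisection is applicable.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if limit_rhs(mid) > 1.0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0
```

The crossing point lies slightly below 2.2, which is consistent with the threshold stated in Theorem 8.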

4.2 Additive Setting 

We now transfer the results obtained in the previous subsection to the
additive setting. That is, we are interested in the question "Can we find a
linear function $g$ such that $\ln(1 + g)$ serves as a drift function for all linear
functions (with monotone weights)?". We can apply the methodology of the
previous propositions to show that such a linear universal function $g$ does
not exist if the mutation probability $c/n$ is large.

We do not try to find the best possible constant, but prefer to use the
simple approach obtained via the multiplicative drift in the previous
subsection. We then use a numerical example to show that linear universal drift
functions do not exist if $n = 100$ and $c > 4$.

Theorem 12. Let $n = 100$ and $c > 4$, and let us consider the (1+1)
EA$_c$. For every linear function
$g\colon \{0,1\}^n \to \mathbb{R},\; x \mapsto \sum_{j=1}^{n} g_j x_j$ with weights
$1 = g_1 \le \dots \le g_n$ there exist an $x \in \{0,1\}^n$ and a linear function
$f$ with monotone weights such that we have $E[\Delta_c(\ln(1 + g), f, x)] < 0$.

We are going to use the tools that we have just developed for the
multiplicative setting. Thus, we again apply contraposition. Therefore, let us
fix some function $g\colon \{0,1\}^n \to \mathbb{R},\; x \mapsto \sum_{j=1}^{n} g_j x_j$
with $1 = g_1 \le \dots \le g_n$ such that $E[\Delta_c(\ln(1 + g), f, x)] \ge 0$ for
all $f$ and all $x$ as in the statement.

Proposition 13. Let $c$ be a constant, let $p := c/n$ and let us consider
the (1+1) EA$_c$. If $g$ is a linear function with monotone weights such that
$E[\Delta_c(\ln(1 + g), f, x)] \ge 0$ for all $f$ and $x$ as in Theorem 12, the
following holds. For every $i \in [n]$, we have

$$\ln(1 + g_i) \ge \max\Bigl( \ln(2),\; \sum_{j=1}^{i-1} \binom{i-1}{j} (1-p)^{i-1-j} p^{j} \ln(1 + j) \Bigr).$$

Proof. For fixed $i$, let $f = \textsc{BinVal}$ and $x = e_i$. As in the proof of
Proposition 9 let $Y$ denote the mutation vector and let $A$ be the event that
$f(x \oplus Y) < f(x)$. Then $A$ occurs if and only if $Y_i = 1$ and $Y_j = 0$ for
all $j > i$. We have

$$0 \le E[\Delta_c(\ln(1 + g), f, x)] = E[\ln(1 + g(x)) - \ln(1 + g(x \oplus Y)) \mid A] \cdot \Pr[A],$$

where $\Pr[A]$ has been shown to be positive in the proof of Proposition 9.
Therefore, $0 \le E[\ln(1 + g(x)) - \ln(1 + g(x \oplus Y)) \mid A]$.

Given that $Y_i = 1$ and $Y_j = 0$ for all $j > i$, the first $i-1$ bits are subject
to independent, random mutation with mutation probability $p$. Thus, the
probability that exactly $k \le i-1$ of the first $i-1$ bits flip equals
$\binom{i-1}{k}(1-p)^{i-1-k} p^{k}$. Thus, considering the fact that $g_j \ge 1$
for all $j \in [n]$, we obtain

$$0 \le \ln(1 + g_i) - \sum_{j=1}^{i-1} \binom{i-1}{j} (1-p)^{i-1-j} p^{j} \ln(1 + j).$$

The bound $\ln(1 + g_i) \ge \ln(2)$ follows directly from $g_i \ge 1$. □

Proposition 14. Let $c$ be a constant and let us consider the (1+1) EA$_c$.
Let $g$ be a linear function with monotone weights such that
$E[\Delta_c(\ln(1 + g), f, x)] \ge 0$ for all linear functions $f$ and every
$x \in \{0,1\}^n$. If we set $p := c/n$, then

$$\sum_{j=1}^{n} \ln(1 + g_j) \le \ln(2)\, \frac{1 + p(n-1)}{p}.$$

Proof. Like in Proposition 11, let $f = \textsc{OneMax}$ and $x = e_1$. Then, the
same arguments used there yield

$$0 \le (1-p) \ln(1 + g_1) + p \sum_{j=2}^{n} \bigl( \ln(1 + g_1) - \ln(1 + g_j) \bigr).$$

The statement follows from rearranging and $g_1 = 1$. □

Proof of Theorem 12. Propositions 13 and 14 yield

$$\sum_{i=1}^{n} \sum_{j=1}^{i-1} \binom{i-1}{j} (1-p)^{i-1-j} p^{j} \ln(1 + j) \le \sum_{j=1}^{n} \ln(1 + g_j) \le \ln(2)\, \frac{1 + np - p}{p}.$$

We use Maple to compute that for $p = 4/100$ the term on the left is bigger
than 91, whereas the term on the right equals $124 \ln(2) < 86$. □
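The comparison in this proof can be re-evaluated without Maple. The sketch below computes both sides for $n = 100$ and $p = 4/100$, using the double sum as reconstructed from the garbled source (inner index $j = 1, \dots, i-1$); this reconstruction and the function names are assumptions. The sketch is meant to confirm only that the left-hand side exceeds the right-hand side, not to reproduce the digits reported from Maple.

```python
import math


def left_hand_side(n, p):
    """Sum over i of E[ln(1 + B)] with B ~ Binomial(i - 1, p)."""
    total = 0.0
    for i in range(1, n + 1):
        m = i - 1
        for j in range(1, m + 1):
            total += (math.comb(m, j) * (1 - p) ** (m - j) * p ** j
                      * math.log(1 + j))
    return total


def right_hand_side(n, p):
    """Upper bound ln(2) * (1 + n*p - p) / p from Proposition 14."""
    return math.log(2) * (1 + n * p - p) / p
```

Calling `left_hand_side(100, 0.04)` and `right_hand_side(100, 0.04)` gives the two quantities compared in the proof; the right-hand side equals $124 \ln(2)$.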

Note that we could improve the constant $c > 4$ in the previous proof if
we took into account that for every $i \in [n]$ we have that
$\ln(1 + g_i) \ge \ln(2)$. But, as written above, we do not elaborate on this
idea any further.



4.3 Distribution-Based Multiplicative Drift 

A natural question to ask is whether the distribution-based approach of
Jägersküpper [Jag08] and in particular the application of Theorem 5 does
help.

We show that this is not the case. More precisely, we show that there
exist probability distributions satisfying the requirements of Theorem 5 which
do not allow universal drift functions for $c > 7$.

To formulate this statement rigorously, we introduce the notion of
$\Delta_c(f, g, \mathcal{D})$. In the style of Definition 7, it denotes the change
in the potential function $g$ of the (1+1) EA$_c$ minimizing the function $f$
with individuals distributed according to distribution $\mathcal{D}$.

Definition 15 ($\Delta_c(f, g, \mathcal{D})$). Let $n \in \mathbb{N}$ and let $c$ be
a constant. Moreover, let $f$ and $g$ be two functions on $\{0,1\}^n$ and
$\mathcal{D}\colon \{0,1\}^n \to [0,1]$ be a probability distribution on
$\{0,1\}^n$. Finally, let $x \in \{0,1\}^n$ be drawn according to $\mathcal{D}$ and
let $y \in \{0,1\}^n$ be sampled by flipping each bit of $x$ independently
with probability $c/n$. Then the random variable $\Delta_c(f, g, \mathcal{D})$ is
defined as

$$\Delta_c(f, g, \mathcal{D}) := \begin{cases} g(x) - g(y) & \text{if } f(y) \le f(x), \\ 0 & \text{otherwise.} \end{cases}$$

We now show that in the setting of Theorem 5 linear universal drift
functions do not exist for $c > 7$.

Theorem 16. Let $n \in \mathbb{N}$ be sufficiently large, $c > 7$ a constant, and
let $g\colon \{0,1\}^n \to \mathbb{R}$ be a linear function with monotone weights
$1 = g_1 \le \dots \le g_n$. Then there exist a linear function
$f\colon \{0,1\}^n \to \mathbb{R}$ and a probability distribution
$\mathcal{D}\colon \{0,1\}^n \to [0,1]$ with

$$\Pr_{\mathcal{D}}[x_1 = 0] \le \dots \le \Pr_{\mathcal{D}}[x_n = 0] \tag{5}$$

such that $E[\Delta_c(f, g, \mathcal{D})] < 0$.

The rest of this section is devoted to the proof of the previous theorem.
For this purpose, we consider the following collection of distributions
on $\{0,1\}^n$.

Definition 17 (Distributions $\mathcal{D}_k$ on $\{0,1\}^n$). Let $n \in \mathbb{N}$
and $k \in [n]$. We define a distribution $\mathcal{D}_k\colon \{0,1\}^n \to [0,1]$
by setting for all $x \in \{0,1\}^n$

$$\mathcal{D}_k(x) := \begin{cases} 1/k & \text{if } x = e_i \text{ with } i \in [k], \\ 0 & \text{otherwise.} \end{cases}$$
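That the distributions of Definition 17 satisfy the monotonicity condition (5) can be checked directly from the definition. The sketch below builds $\mathcal{D}_k$ as a dictionary over unit vectors and computes the marginal probabilities $\Pr[x_i = 0]$ with exact fractions; the function names are hypothetical.

```python
from fractions import Fraction


def unit_vector(n, i):
    """e_i as a tuple of length n; position i - 1 stores bit i."""
    return tuple(1 if j == i - 1 else 0 for j in range(n))


def distribution_k(n, k):
    """D_k: probability 1/k on each of e_1, ..., e_k, zero elsewhere."""
    return {unit_vector(n, i): Fraction(1, k) for i in range(1, k + 1)}


def zero_probabilities(n, k):
    """Pr[x_i = 0] for i = 1..n when x is drawn according to D_k."""
    dist = distribution_k(n, k)
    return [sum(prob for x, prob in dist.items() if x[i] == 0)
            for i in range(n)]


def satisfies_condition_5(n, k):
    """Check Pr[x_1 = 0] <= ... <= Pr[x_n = 0]."""
    probs = zero_probabilities(n, k)
    return all(a <= b for a, b in zip(probs, probs[1:]))
```

The marginals are $1 - 1/k$ for the first $k$ positions and 1 for the remaining ones, so the chain of inequalities in (5) holds for every $k$.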



Let $n \in \mathbb{N}$ be sufficiently large and assume that Theorem 16 does
not hold. Then there exist a constant $c > 7$ and a linear function
$g\colon \{0,1\}^n \to \mathbb{R},\; x \mapsto \sum_{j=1}^{n} g_j x_j$ with weights
$1 = g_1 \le \dots \le g_n$ such that $E[\Delta_c(f, g, \mathcal{D}_k)] \ge 0$ for
all linear functions $f$ and for every $k \in [n]$.

The following proposition gives an upper bound for the sum of the
weights of $g$. It is a direct consequence of Proposition 11 for
$\mathcal{D} = \mathcal{D}_1$.

Proposition 18. Let $c$ be a constant. If $g$ is a linear universal drift function
with monotone weights such that $E[\Delta_c(f, g, \mathcal{D}_k)] \ge 0$ for all
linear functions $f$ and for every $k \in [n]$, then
$\sum_{i=1}^{n} g_i \le n - 1 + n/c$.

Proof. The proof is similar to the one of Proposition 11. Let $f = \textsc{OneMax}$.
As $\mathcal{D}_1$ is the simple distribution with $\mathcal{D}_1(e_1) = 1$, we
immediately have $x = e_1$ if $x$ is sampled from $\{0,1\}^n$ according to
$\mathcal{D}_1$. Furthermore we have required that $1 = g_1 \le \dots \le g_n$.
That is, we are in exactly the same situation as in the proof of Proposition 11
and conclude the proof as elaborated there. □

A lower bound for the weights is given by the following result.

Proposition 19. Let $c$ be a constant and let $g$ be a linear function with
$1 = g_1 = \min_{j \in [n]} g_j$ and $E[\Delta_c(f, g, \mathcal{D}_k)] \ge 0$ for all
linear functions $f$ and for every $k \in [n]$. Furthermore, let
$s = \min\{i \in \mathbb{N} \mid (1 - c/n)^{i} \le 1/2\}$. For all $k \in [n]$ it
holds that

$$g_k \ge k - s - \frac{n}{c} + g_{k-1}\Bigl(s + 1 - 2\,\frac{n}{c}\Bigr).$$

Proof. Let $k \in [n]$. We set $f := \mathrm{BinVal}$. Let $x$ be sampled from $\{0,1\}^n$
according to $\mathcal{D}_k$. Let $Y \in \{0,1\}^n$ with $\Pr[Y_j = 1] = c/n$ independently
for all $j \in [n]$. Again we abbreviate $p := c/n$. By the definition of $\mathcal{D}_k$ we
have for all $i \le k$ that $\Pr[x = e_i] = 1/k$. Thus,

$$0 \le E[\Delta_c(f, g, \mathcal{D}_k)] = \frac{1}{k} \sum_{i=1}^{k} \Pr[f(e_i \oplus Y) \le f(e_i)] \; E\bigl[g(e_i) - g(e_i \oplus Y) \mid f(e_i \oplus Y) \le f(e_i)\bigr].$$

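As outlined in the proof of Proposition 9, it holds that $f(e_i \oplus Y) \le f(e_i)$
if and only if either $Y = 0$, in which case $g(e_i) - g(e_i \oplus Y) = 0$, or both
$Y_i = 1$ and $Y_j = 0$ for all $j > i$, in which case $f(e_i \oplus Y) < f(e_i)$. Thus,

$$\begin{aligned}
0 &\le \frac{1}{k} \sum_{i=1}^{k} \Pr[f(e_i \oplus Y) < f(e_i)] \; E\bigl[g(e_i) - g(e_i \oplus Y) \mid f(e_i \oplus Y) < f(e_i)\bigr] \\
&= \frac{1}{k} \sum_{i=1}^{k} p(1-p)^{n-i} \; E\bigl[g(e_i) - g(e_i \oplus Y) \mid f(e_i \oplus Y) < f(e_i)\bigr].
\end{aligned}$$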

As in the proof of Proposition 9 we obtain that

$$E\bigl[g(e_i) - g(e_i \oplus Y) \mid f(e_i \oplus Y) < f(e_i)\bigr] = g_i - p \sum_{j=1}^{i-1} g_j.$$

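Putting everything together we have

$$\begin{aligned}
0 &\le \frac{1}{k} \sum_{i=1}^{k} p(1-p)^{n-i} \Bigl(g_i - p \sum_{j=1}^{i-1} g_j\Bigr) \\
&= \frac{p}{k} \Bigl(g_k (1-p)^{n-k} - \sum_{i=1}^{k-1} g_i \Bigl(p \sum_{j=i+1}^{k} (1-p)^{n-j} - (1-p)^{n-i}\Bigr)\Bigr).
\end{aligned}$$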



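Multiplication by $k p^{-1} (1-p)^{k-n}$ and sorting yields

$$\begin{aligned}
g_k &\ge \sum_{i=1}^{k-1} g_i \Bigl(p \sum_{j=i+1}^{k} (1-p)^{k-j} - (1-p)^{k-i}\Bigr) \\
&= \sum_{i=1}^{k-1} g_i \Bigl(p \sum_{j=0}^{k-i-1} (1-p)^{j} - (1-p)^{k-i}\Bigr) \\
&= \sum_{i=1}^{k-1} g_i \Bigl(p \, \frac{1 - (1-p)^{k-i}}{1 - (1-p)} - (1-p)^{k-i}\Bigr) \\
&= \sum_{i=1}^{k-1} g_i \bigl(1 - 2(1-p)^{k-i}\bigr). \qquad (6)
\end{aligned}$$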

By definition of $s$ the summands in (6) are positive if and only if $k - i \ge s$.
Thus, we can split the sum into a positive and a negative part. This yields

$$g_k \ge \sum_{i=1}^{k-s} g_i \bigl(1 - 2(1-p)^{k-i}\bigr) + \sum_{i=k-s+1}^{k-1} g_i \bigl(1 - 2(1-p)^{k-i}\bigr).$$

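We now make use of the fact that $1 \le g_i$ (in the first sum) and that
the $g_i$ are monotonically increasing (in the second sum, whose summands are
negative). This yields

$$\begin{aligned}
g_k &\ge \sum_{i=1}^{k-s} \bigl(1 - 2(1-p)^{k-i}\bigr) + \sum_{i=k-s+1}^{k-1} g_{k-1} \bigl(1 - 2(1-p)^{k-i}\bigr) \\
&= k - s - 2 \sum_{i=s}^{k-1} (1-p)^{i} + g_{k-1} \Bigl(s - 1 - 2 \sum_{i=1}^{s-1} (1-p)^{i}\Bigr) \\
&= k - s - 2 \Bigl(\frac{(1-p)^{s} - (1-p)^{k}}{p}\Bigr) + g_{k-1} \Bigl(s - 1 - 2 \Bigl(\frac{1 - (1-p)^{s}}{p} - 1\Bigr)\Bigr) \\
&= k - s - \frac{2n}{c} \bigl((1-p)^{s} - (1-p)^{k}\bigr) + g_{k-1} \Bigl(s + 1 - \frac{2n}{c} \bigl(1 - (1-p)^{s}\bigr)\Bigr).
\end{aligned}$$

Together with $(1-p)^{s} - (1-p)^{k} \le 1/2$ and $1 - (1-p)^{s} \le 1$, we obtain the
desired $g_k \ge k - s - \frac{n}{c} + g_{k-1}\bigl(s + 1 - \frac{2n}{c}\bigr)$. □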
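The closed form used in the proof can be sanity-checked numerically for small $n$. The following is a minimal sketch, not the authors' code. It assumes that $\Delta_c(f, g, x)$ equals $g(x) - g(x \oplus Y)$ when the offspring is accepted, i.e. when $f(x \oplus Y) \le f(x)$, and $0$ otherwise, that each bit of $Y$ is 1 independently with probability $p = c/n$, and that $e_i$ is the $i$-th unit vector. All function names and the example weights are hypothetical.

```python
from itertools import product

def bin_val(x):
    """BinVal with weight 2^(i-1) on the i-th bit (x[0] is bit 1)."""
    return sum(bit << j for j, bit in enumerate(x))

def drift_bruteforce(g, k, p):
    """Expected drift under D_k, enumerating every mutation mask Y (assumed model)."""
    n = len(g)
    total = 0.0
    for i in range(k):                        # parent x = e_{i+1}, mass 1/k
        x = [1 if j == i else 0 for j in range(n)]
        for y in product((0, 1), repeat=n):
            z = [a ^ b for a, b in zip(x, y)]
            if bin_val(z) <= bin_val(x):      # offspring accepted (minimization)
                flips = sum(y)
                prob = p ** flips * (1 - p) ** (n - flips)
                gain = sum(w * (a - b) for w, a, b in zip(g, x, z))
                total += prob * gain / k
    return total

def drift_closed_form(g, k, p):
    """(1/k) * sum_i p (1-p)^(n-i) (g_i - p * sum_{j<i} g_j), as in the proof."""
    n = len(g)
    return sum(p * (1 - p) ** (n - i) * (g[i - 1] - p * sum(g[:i - 1]))
               for i in range(1, k + 1)) / k

def rhs_of_6(g, k, p):
    """Right-hand side of inequality (6)."""
    return sum(g[i - 1] * (1 - 2 * (1 - p) ** (k - i)) for i in range(1, k))

g = [1.0, 1.5, 2.0, 4.0, 7.0]    # arbitrary monotone example weights
p = 0.3                          # stands in for c/n
for k in range(1, len(g) + 1):
    exact = drift_bruteforce(g, k, p)
    closed = drift_closed_form(g, k, p)
    # Rescaling by k * p^(-1) * (1-p)^(k-n) should give g_k minus the RHS of (6).
    rescaled = k / p * (1 - p) ** (k - len(g)) * closed
    print(k, exact, closed, rescaled, g[k - 1] - rhs_of_6(g, k, p))
```

Under these assumptions the brute-force value and the closed form agree, and the rescaled drift equals $g_k$ minus the right-hand side of (6), so a non-negative drift is equivalent to inequality (6).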

Note that $(1 - c/n)^{n/c} \le e^{-1} < 1/2$, which, by definition of $s$, results in
$s \le n/c$. Hence, $s + 1 - 2n/c < 0$. Therefore, the lower bound on $g_k$ provided
by Proposition 19 is the better, the smaller $g_{k-1}$ is. On the other hand, we know
that the weights $g_i$ of $g$ are increasing in $i$. The idea of proving Theorem 16
is simply to use the better of the two estimates for one particular $g_k$.
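To get a feeling for the size of $s$, the threshold can be computed directly. This is a small illustrative sketch; it assumes the strict inequality $(1 - c/n)^i < 1/2$ in the definition of $s$ and $0 < c < n$, and the function name `threshold_s` is hypothetical.

```python
def threshold_s(n, c):
    """Smallest positive integer i with (1 - c/n)^i < 1/2 (assumes 0 < c < n)."""
    p = c / n
    i = 1
    while (1 - p) ** i >= 0.5:
        i += 1
    return i

n, c = 1000, 8
s = threshold_s(n, c)
# s is roughly ln(2) * n / c, hence below n / c, and s + 1 - 2n/c is negative.
print(s, n / c, s + 1 - 2 * n / c)
```

For these example values the coefficient $s + 1 - 2n/c$ of $g_{k-1}$ in Proposition 19 is clearly negative, which is what makes a small $g_{k-1}$ favorable for the lower bound.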

Proof of Theorem 16. We use the upper and lower bounds for $\sum_{i=1}^{n} g_i$
obtained in the previous two propositions and show that they contradict each
other for $c > 7$. To this end, let us abbreviate $\ell := \lceil 6n/c \rceil$ and make
a case distinction with respect to the size of $g_\ell$. First, let us assume that
$g_\ell < 2$. Recall that $s + 1 - 2n/c < 0$. We can thus apply Proposition 19 to obtain

$$g_{\ell+1} \ge \frac{6n}{c} + 1 - s - \frac{n}{c} + 2\Bigl(s + 1 - \frac{2n}{c}\Bigr) = \frac{n}{c} + s + 3.$$

Due to the fact that $g_i \ge 1$ for all $i$, we can bound the sum of the weights of $g$
from below by

$$\sum_{i=1}^{n} g_i \ge (n - 1) + \frac{n}{c} + s + 3.$$

But this inequality contradicts the upper bound $\sum_{i=1}^{n} g_i \le (n - 1) + \frac{n}{c}$
obtained in Proposition 18.

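Therefore it must hold that $g_\ell \ge 2$. In this case, the monotonicity
condition of the weights $g_1 \le \dots \le g_n$ implies that $g_i \ge 2$ for all $i \ge \ell$.
By definition we also have $g_j \ge 1$ for all $j \in [\ell - 1]$. Thus,

$$\sum_{i=1}^{n} g_i \ge \Bigl\lfloor \frac{6n}{c} \Bigr\rfloor + 2\Bigl(n - \Bigl\lfloor \frac{6n}{c} \Bigr\rfloor\Bigr) \ge 2n - \frac{6n}{c},$$

again contradicting $\sum_{i=1}^{n} g_i \le (n - 1) + \frac{n}{c}$ for $c \ge 7$. □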
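The arithmetic of the two cases can be replayed for concrete values of $n$ and $c$. The sketch below is only an illustration under stated assumptions: it reads the abbreviation in the proof as $\ell = \lceil 6n/c \rceil$, uses the strict inequality in the definition of $s$, and introduces the hypothetical helper `bounds`.

```python
import math

def bounds(n, c):
    """Return (s, upper, lower_case1, lower_case2) for the sum of the weights of g."""
    p = c / n
    s = next(i for i in range(1, n + 1) if (1 - p) ** i < 0.5)
    ell = math.ceil(6 * n / c)                 # assumed reading of the proof
    upper = (n - 1) + n / c                    # Proposition 18
    # Case g_ell < 2: Proposition 19 with k = ell + 1 and a negative coefficient.
    g_next = (ell + 1) - s - n / c + 2 * (s + 1 - 2 * n / c)
    lower_case1 = (n - 1) + g_next             # all other weights are at least 1
    # Case g_ell >= 2: weights 1..ell-1 are >= 1, weights ell..n are >= 2.
    lower_case2 = (ell - 1) + 2 * (n - ell + 1)
    return s, upper, lower_case1, lower_case2

s, upper, lower1, lower2 = bounds(1000, 8)
print(s, upper, lower1, lower2)
```

In both cases the lower bound exceeds the upper bound of Proposition 18 for these example values, which is the contradiction the proof derives.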

5 Conclusion


In this work we considered the state-of-the-art proof techniques for analyz- 
ing the runtime of the (1+1) EA optimizing linear functions. We found 
that both the classical proof via additive drift as well as the more recent 
multiplicative method stop working for mutation probabilities beyond c/n, 
where c is a small constant. This problem cannot be solved by defining the
weights $g_i$ of the drift function differently: we have shown that for any
choice of $g$ there is a linear function $f$ such that the drift $E[\Delta(f, g, x)]$ is
negative for some search point $x$.

We also showed that the Jägersküpper method fails for mutation
probabilities larger than 7/n. This raises the question how the current,
generally very successful drift methods can be used with larger mutation 
probabilities. 

As can easily be seen, we did not put much effort into optimizing the
constants c. Although we do not know the minimum value of this constant,
we find that already the presented values are frighteningly close to the most 
commonly used mutation probability of 1/n. 

A more challenging problem arising from this work, naturally, is to find 
methods that work for mutation probabilities larger than these barriers. As 
our analysis shows, here either the drift function has to be chosen individ- 
ually for each objective function, or different classes of drift functions than 
those regarded by us have to be used. Both might, though, again lead to 
tedious calculations. 

Note added in proof: Indeed, at the recent PPSN conference Doerr
and Goldberg [DG10] managed to prove the $O(n \log n)$ bound for arbitrary
c/n mutation probabilities by defining a drift function for each linear
objective function $f$ and each constant c. This construction, however, is quite
technical.